mc sample
Partially Observable Gaussian Process Network and Doubly Stochastic Variational Inference
Kiroriwal, Saksham, Pfrommer, Julius, Beyerer, Jürgen
To reduce the curse of dimensionality in Gaussian processes (GPs), a high-dimensional process can be decomposed into a Gaussian Process Network (GPN) of coupled, lower-dimensional subprocesses. Intermediate observations are sometimes available within the GPN, but in most real-world systems they are indirect, noisy, and incomplete. This work introduces the Partially Observable Gaussian Process Network (POGPN) to model such real-world process networks. We model the joint distribution over the latent functions of the subprocesses and perform inference using observations from all subprocesses. POGPN incorporates observation lenses (observation likelihoods) into the well-established inference framework of deep Gaussian processes. We also introduce two training methods for POGPN that use node observations to perform inference over the whole network. Applications to benchmark problems demonstrate how incorporating partial observations during training and inference can improve the predictive performance of the overall network, offering a promising outlook for practical application.
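A minimal sketch of the setting, assuming a toy two-node network and an illustrative RBF kernel (none of this is the authors' code): node 1's latent output feeds node 2, but node 1 is only seen through a noisy, incomplete observation lens.

```python
# Toy generative view of a partially observable GPN; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, lengthscale=0.5):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

# Node 1: latent subprocess f1 ~ GP(0, k) over inputs x.
x = np.linspace(0.0, 1.0, 50)
K1 = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))
f1 = rng.multivariate_normal(np.zeros(len(x)), K1)

# Observation lens for node 1: an indirect, noisy, incomplete view of f1.
mask = rng.random(len(x)) < 0.3            # only ~30% of sites are observed
y1 = f1[mask] + 0.1 * rng.normal(size=mask.sum())

# Node 2: subprocess f2 takes node 1's (low-dimensional) output as input.
K2 = rbf_kernel(f1, f1, lengthscale=1.0) + 1e-6 * np.eye(len(f1))
f2 = rng.multivariate_normal(np.zeros(len(f1)), K2)
y2 = f2 + 0.05 * rng.normal(size=len(f2))  # node-2 observations

# POGPN-style inference would condition the joint posterior over (f1, f2)
# on both y1 (partial, through the lens) and y2, e.g. with doubly
# stochastic variational inference as in deep GPs.
```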
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, Roux, Nicolas Le
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks struggle to predict the expected cumulative reward accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across the MATH and GSM8K datasets with fewer gradient updates (up to 9x) and less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLMs and demonstrate VinePPO's potential as a superior alternative.
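As a rough illustration of the core idea, the sketch below estimates per-step values by averaging the returns of a few rollouts continued from each intermediate reasoning state; `policy_sample` and `reward` are toy stand-ins for an LLM and an answer verifier, not the released VinePPO code.

```python
import random

def policy_sample(prefix):
    # Toy stand-in for sampling a completion from the LLM policy.
    return random.choice([" hence the answer is 42", " hence the answer is 7"])

def reward(solution):
    # Toy stand-in for a final-answer verifier: 1 if correct, else 0.
    return 1.0 if solution.endswith("42") else 0.0

def mc_value(prefix, num_rollouts=8):
    # V(s_t) estimated as the mean return of K independent continuations
    # of the partial solution -- unbiased, with no value network to train.
    returns = [reward(prefix + policy_sample(prefix)) for _ in range(num_rollouts)]
    return sum(returns) / num_rollouts

def step_advantages(steps, num_rollouts=8):
    # With a terminal-only reward, A_t = V(s_{t+1}) - V(s_t); the value of
    # the completed solution is simply its reward.
    values = [mc_value("".join(steps[:t]), num_rollouts) for t in range(len(steps))]
    values.append(reward("".join(steps)))
    return [values[t + 1] - values[t] for t in range(len(steps))]

print(step_advantages(["Step 1: compute 6*7.", " Step 2: the answer is 42"]))
```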
- Europe > Austria > Vienna (0.14)
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (10 more...)
- Education (0.67)
- Leisure & Entertainment (0.46)
Uncertainty-Informed Volume Visualization using Implicit Neural Representation
Saklani, Shanu, Goel, Chitwan, Bansal, Shrey, Wang, Zhe, Dutta, Soumya, Athawale, Tushar M., Pugmire, David, Johnson, Christopher R.
The increasing adoption of Deep Neural Networks (DNNs) has led to their application in many challenging scientific visualization tasks. While advanced DNNs offer impressive generalization capabilities, understanding factors such as model prediction quality, robustness, and uncertainty is crucial. These insights can enable domain scientists to make informed decisions about their data. However, DNNs inherently lack the ability to estimate prediction uncertainty, necessitating new research on robust uncertainty-aware visualization techniques tailored to various visualization tasks. In this work, we propose uncertainty-aware implicit neural representations to model scalar field data sets effectively and comprehensively study the efficacy and benefits of estimated uncertainty information for volume visualization tasks. We evaluate two principled deep uncertainty estimation techniques, (1) Deep Ensemble and (2) Monte Carlo Dropout (MCDropout), which enable uncertainty-informed volume visualization of scalar field data sets. Our extensive exploration across multiple data sets demonstrates that uncertainty-aware models produce informative volume visualization results. Moreover, integrating prediction uncertainty enhances the trustworthiness of our DNN model, making it suitable for robustly analyzing and visualizing real-world scientific volumetric data sets.
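A minimal MCDropout sketch under toy assumptions (the architecture and names are illustrative, not the paper's model): dropout stays active at inference, and the spread over repeated stochastic forward passes serves as the per-sample uncertainty estimate.

```python
import torch
import torch.nn as nn

class DropoutINR(nn.Module):
    """Tiny implicit neural representation of a scalar field with dropout."""
    def __init__(self, in_dim=3, hidden=128, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):
        return self.net(coords)

@torch.no_grad()
def mc_dropout_predict(model, coords, num_samples=32):
    model.train()  # keep dropout stochastic at inference time
    preds = torch.stack([model(coords) for _ in range(num_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # field estimate, uncertainty

# Usage: query (x, y, z) sample points of the volume.
model = DropoutINR()
coords = torch.rand(1024, 3)
mean, std = mc_dropout_predict(model, coords)
```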
- Asia > India > Uttar Pradesh > Kanpur (0.04)
- North America > United States > Utah (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Energy (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
When Monte-Carlo Dropout Meets Multi-Exit: Optimizing Bayesian Neural Networks on FPGA
Fan, Hongxiang, Chen, Hao, Castelli, Liam, Que, Zhiqiang, Li, He, Long, Kenneth, Luk, Wayne
Bayesian Neural Networks (BayesNNs) have demonstrated their capability to provide calibrated predictions for safety-critical applications such as medical imaging and autonomous driving. However, their high algorithmic complexity and poor hardware performance hinder deployment in real-life applications. To bridge this gap, this paper proposes a novel multi-exit Monte-Carlo Dropout (MCD)-based BayesNN that achieves well-calibrated predictions with low algorithmic complexity. To further reduce the barrier to adopting BayesNNs, we propose a transformation framework that can generate FPGA-based accelerators for multi-exit MCD-based BayesNNs. Several novel optimization techniques are introduced to improve hardware performance. Our experiments demonstrate that our auto-generated accelerator achieves higher energy efficiency than CPU, GPU, and other state-of-the-art hardware implementations.
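A software-level sketch of the multi-exit MCD idea (illustrative PyTorch, not the FPGA design): every exit of every stochastic forward pass contributes one Monte Carlo sample, so fewer full passes are needed for a calibrated average.

```python
import torch
import torch.nn as nn

class MultiExitMCD(nn.Module):
    def __init__(self, in_dim=32, hidden=64, classes=10, p=0.2):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p))
        self.block2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p))
        self.exit1 = nn.Linear(hidden, classes)   # early exit
        self.exit2 = nn.Linear(hidden, classes)   # final exit

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.exit1(h1), self.exit2(h2)

@torch.no_grad()
def predict(model, x, num_passes=5):
    model.train()  # dropout stays on: each pass draws fresh dropout masks
    probs = []
    for _ in range(num_passes):
        for logits in model(x):               # reuse every exit as a sample
            probs.append(logits.softmax(dim=-1))
    return torch.stack(probs).mean(dim=0)     # ensemble over exits x passes

model = MultiExitMCD()
p = predict(model, torch.randn(8, 32))
```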
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Greater London > London (0.05)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology (0.88)
- Health & Medicine > Diagnostic Medicine > Imaging (0.34)
Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics
Daudel, Kamélia, Benton, Joe, Shi, Yuyang, Doucet, Arnaud
Several algorithms involving the Variational Rényi (VR) bound have been proposed to minimize an alpha-divergence between a target posterior distribution and a variational distribution. Despite promising empirical results, these algorithms resort to biased stochastic gradient descent procedures and thus lack theoretical guarantees. In this paper, we formalize and study the VR-IWAE bound, a generalization of the Importance Weighted Auto-Encoder (IWAE) bound. We show that the VR-IWAE bound enjoys several desirable properties and notably leads to the same stochastic gradient descent procedure as the VR bound in the reparameterized case, but this time relying on unbiased gradient estimators. We then provide two complementary theoretical analyses of the VR-IWAE bound and thus of the standard IWAE bound. These analyses shed light on the benefits, or lack thereof, of these bounds. Lastly, we illustrate our theoretical claims on toy and real-data examples.
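A small reparameterized estimator of the VR-IWAE bound, L_N^(alpha) = (1/(1-alpha)) E[log (1/N) sum_i w_i^(1-alpha)] with w_i = p(x, z_i)/q(z_i|x), on a toy Gaussian model (the model and names are assumptions for illustration; alpha -> 0 recovers the IWAE bound):

```python
import math
import torch

def vr_iwae_bound(log_p_joint, log_q, x, mu, log_sigma, N=16, alpha=0.5):
    # Reparameterized samples z_i = mu + sigma * eps_i, eps_i ~ N(0, 1),
    # so gradients w.r.t. (mu, log_sigma) are unbiased.
    eps = torch.randn(N, *mu.shape)
    z = mu + log_sigma.exp() * eps
    log_w = log_p_joint(x, z) - log_q(z, mu, log_sigma)          # shape (N,)
    # logsumexp for numerical stability of log (1/N) sum_i w_i^(1-alpha).
    lse = torch.logsumexp((1.0 - alpha) * log_w, dim=0) - math.log(N)
    return lse / (1.0 - alpha)

# Toy model: p(z) = N(0, 1), p(x|z) = N(z, 1), q(z|x) = N(mu, sigma).
Normal = torch.distributions.Normal

def log_p_joint(x, z):
    return Normal(0.0, 1.0).log_prob(z) + Normal(z, 1.0).log_prob(x)

def log_q(z, mu, log_sigma):
    return Normal(mu, log_sigma.exp()).log_prob(z)

x = torch.tensor(1.0)
mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)
bound = vr_iwae_bound(log_p_joint, log_q, x, mu, log_sigma)
bound.backward()   # unbiased reparameterized gradients for mu, log_sigma
```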
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes
Takeno, Shion, Nomura, Masahiro, Karasuyama, Masayuki
We study preferential Bayesian optimization (BO), where reliable feedback is limited to pairwise comparisons called duels. An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent flexible preference structures, is that the posterior distribution is a computationally intractable skew GP. The most widely used approach for preferential BO is the Gaussian approximation, which ignores the skewness of the true posterior. Alternatively, Markov chain Monte Carlo (MCMC)-based preferential BO has also been proposed. In this work, we first examine the accuracy of the Gaussian approximation and reveal the critical problem that the predictive probability of duels can be inaccurate. This observation motivates us to improve the MCMC-based estimation for skew GPs, for which we show the practical efficiency of Gibbs sampling and derive a low-variance Monte Carlo estimator. However, the computational time of MCMC can still be a bottleneck in practice. Towards more practical preferential BO, we develop a new method that achieves both high computational efficiency and low sample complexity, and demonstrate its effectiveness through extensive numerical experiments.
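To make the duel-probability quantity concrete, here is a hedged sketch: given posterior draws of the latent utilities (e.g. from Gibbs sampling), the predictive probability of a duel can be estimated by averaging the closed-form Gaussian CDF of the observation noise over the samples, a Rao-Blackwellized form in the spirit of a low-variance estimator. All names and the noise model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def duel_probability(f_samples_x, f_samples_xp, noise_std=0.1):
    # f_samples_* : posterior draws of f(x) and f(x') from the skew posterior.
    # Given the latents, P(x beats x') under i.i.d. Gaussian duel noise is
    # Phi((f(x) - f(x')) / (sqrt(2) * sigma)); averaging this closed form
    # over the draws has lower variance than thresholding noisy samples.
    margin = (f_samples_x - f_samples_xp) / (np.sqrt(2.0) * noise_std)
    return norm.cdf(margin).mean()

# Toy usage with stand-in posterior samples (skewed on purpose):
rng = np.random.default_rng(0)
fx = rng.gamma(2.0, 0.5, size=2000)     # hypothetical skewed posterior draws
fxp = rng.normal(1.0, 0.3, size=2000)
print(duel_probability(fx, fxp))
```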
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.51)
Massively Scaling Heteroscedastic Classifiers
Collier, Mark, Jenatton, Rodolphe, Mustafa, Basil, Houlsby, Neil, Berent, Jesse, Kokiopoulou, Effrosyni
Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes. However, compared to standard classifiers, they introduce extra parameters that scale linearly with the number of classes, making them infeasible for larger-scale problems. In addition, heteroscedastic classifiers introduce a critical temperature hyperparameter that must be tuned. We propose HET-XL, a heteroscedastic classifier whose additional parameter count, relative to a standard classifier, is independent of the number of classes. In our large-scale settings, we show that the need to tune the temperature hyperparameter can be removed by learning it directly on the training data. On large image classification datasets with up to 4B images and 30k classes, our method requires 14x fewer additional parameters, does not require tuning the temperature on a held-out set, and consistently outperforms the baseline heteroscedastic classifier. HET-XL also improves ImageNet zero-shot classification in a multimodal contrastive learning setup, which can be viewed as a 3.5-billion-class classification problem.
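One way to read the parameter-count claim, as a hedged sketch rather than the released implementation: place the low-rank Gaussian noise in the d-dimensional embedding space before the shared classifier layer, so the extra parameters are O(d*r) regardless of the number of classes, and learn the temperature as an ordinary parameter.

```python
import torch
import torch.nn as nn

class HetHead(nn.Module):
    def __init__(self, d=512, num_classes=30000, rank=6, mc_samples=16):
        super().__init__()
        self.classifier = nn.Linear(d, num_classes)             # shared mean head
        self.low_rank = nn.Parameter(0.01 * torch.randn(d, rank))  # d x r, not K x r
        self.log_diag = nn.Parameter(torch.full((d,), -3.0))    # per-dim log std
        self.log_temp = nn.Parameter(torch.zeros(()))           # learned, not tuned
        self.mc_samples = mc_samples

    def forward(self, emb):                                     # emb: (B, d)
        B, d = emb.shape
        eps_lr = torch.randn(self.mc_samples, B, self.low_rank.shape[1])
        eps_d = torch.randn(self.mc_samples, B, d)
        # Low-rank-plus-diagonal Gaussian noise in embedding space.
        noise = eps_lr @ self.low_rank.T + self.log_diag.exp() * eps_d
        logits = self.classifier(emb + noise) / self.log_temp.exp()
        return logits.softmax(dim=-1).mean(dim=0)               # MC-averaged probs

head = HetHead()
probs = head(torch.randn(4, 512))                               # (4, 30000)
```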
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)